TransferBench v1.67.0 #273
Open
nileshnegi wants to merge 2 commits into develop
Pull request overview
Release PR for TransferBench v1.67.0, adding new presets and pod-aware multi-rank capabilities alongside build-system and usability improvements.
Changes:
- Adds multiple new presets (pod p2p/a2a, hbm, gfx/a2a sweeps, wallclock, smoketest, bmasweep) and expands preset/help/envvar UX.
- Introduces/extends pod detection/grouping utilities and uniformity checks across ranks.
- Modernizes build configuration (CMake + Makefile feature probes/flags) and updates docs/changelog for the release.
Reviewed changes
Copilot reviewed 30 out of 31 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| toolchain-linux.cmake | Removes legacy CMake toolchain file (logic moved into CMakeLists). |
| src/client/Utilities.hpp | Updates rank grouping to use pod index, adds rank-per-pod map, adds uniformity helper/macros, and table sizing fixes. |
| src/client/Topology.hpp | Adjusts multi-rank topology display to show POD index and updated columns. |
| src/client/Presets/WallClock.hpp | Adds new wallclock preset for XCC wallclock consistency detection. |
| src/client/Presets/Sweep.hpp | Preset signature updated to include bytesSpecified. |
| src/client/Presets/SmokeTest.hpp | Adds smoketest correctness preset spanning DMA/GFX operations. |
| src/client/Presets/Schmoo.hpp | Preset signature updated to include bytesSpecified. |
| src/client/Presets/Scaling.hpp | Updates scaling preset to use CPU/GPU mem type env vars and deprecates USE_FINE_GRAIN. |
| src/client/Presets/Presets.hpp | Adds new presets, new preset listing output, and passes bytesSpecified into presets. |
| src/client/Presets/PodPeerToPeer.hpp | Adds pod-aware peer-to-peer bandwidth preset. |
| src/client/Presets/PodAllToAll.hpp | Adds pod-aware all-to-all preset with grouping/stride scheduling. |
| src/client/Presets/PeerToPeer.hpp | Preset signature updated to include bytesSpecified. |
| src/client/Presets/OneToAll.hpp | Preset signature updated to include bytesSpecified. |
| src/client/Presets/NicRings.hpp | Preset signature updated; minor numeric_limits fix; error message env var rename. |
| src/client/Presets/NicPeerToPeer.hpp | Preset signature updated; minor formatting/spaces; error message env var rename. |
| src/client/Presets/Help.hpp | Adds help preset describing transfer/config formats and examples. |
| src/client/Presets/HealthCheck.hpp | Preset signature updated to include bytesSpecified. |
| src/client/Presets/HbmBandwidth.hpp | Adds hbm preset to sweep/read HBM bandwidth with wallclock/event timing. |
| src/client/Presets/GfxSweep.hpp | Adds gfxsweep preset to sweep GFX kernel parameters for a transfer. |
| src/client/Presets/EnvVarsList.hpp | Adds envvars preset to print environment variable list. |
| src/client/Presets/BmaSweep.hpp | Adds bmasweep preset comparing DMA vs batched DMA executor. |
| src/client/Presets/AllToAllSweep.hpp | Refactors a2asweep output formatting and options (MEM_TYPE, NUM_SUB_EXECS, timing mode). |
| src/client/Presets/AllToAllN.hpp | Preset signature updated to include bytesSpecified. |
| src/client/Presets/AllToAll.hpp | Preset signature updated to include bytesSpecified. |
| src/client/EnvVars.hpp | Adds NIC CQ poll batch env var, expands env var listing, and adds string-array env parsing helper. |
| src/client/Client.cpp | Updates default CLI behavior and usage text to reference new help/envvars/presets commands and multi-rank usage. |
| examples/example.cfg | Updates documentation to include new executors (Batched DMA). |
| Makefile | Improves compiler detection, adds feature probes (NIC/MPI/POD/NVML/AMD-SMI), and clarifies build output. |
| CMakeLists.txt | Modernizes CMake (min version, ROCm detection, feature probes, options for NIC/DMA-BUF/POD/AMD-SMI, target linking). |
| CHANGELOG.md | Adds v1.67.00 release notes covering new presets/features and behavior changes. |
AtlantaPepsi (Contributor) previously requested changes on Apr 27, 2026 and left a comment:
Separate cuMem compilation from pod enablement
e5d151b to 6249ec6
- Initial pod communication support (#235)
- cuda + MNNVL update & pod presets (#241)
- Increase CQ size for high qps (#244)
- fix hang when NVML is present but fabricmanager isnt (#246)
- Adding nica2a preset (#248)
- Adding HBM read bandwidth preset (#250)
- Pod Ring preset (#251)
- gfxsweep preset (#254) (#256)
- Adding Batched DMA support (hipMemcpyBatchAsync), and bmasweep preset (#255)
- Adding a wallclock consistency detection preset (#258)
- Adding smoketest preset for simple correctness tests (#266)
- Help / envvars / presets presets (#267)
- Modernize CMake build (#268)
- Replace version-based pod/amd-smi detection with compile-time API probes (#269)
- Fix collective mismatch hangs in multi-rank error paths (#270)
- Fix SHOW_ITERATIONS table truncation with multiple transfers per executor (#271)
- Reformat a2asweep output to match gfxsweep style (#272)
- Gfx sweep update (#274)
- Increasing flush frequency in smoketest (#275)
- Adding new experimental copy-only GFX kernel, gfxsweep update (#277)
- Fixes for cuMem compilation and invalid device ordinal (#278)
- Simplifying socket connect, allow for using host address (#279)
- Updating podring to run on single node without need to force single pod (#280)
- Adding SHOW_PERCENTILES to show extra per-iteration statistics (#281)

---------

Co-authored-by: Tim <43156029+AtlantaPepsi@users.noreply.github.com>
Co-authored-by: Pak Nin Lui <pak.lui@amd.com>
Co-authored-by: pierreantoineH <PierreAntoine.Harraud@amd.com>
Co-authored-by: Nilesh M Negi <Nilesh.Negi@amd.com>
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
6249ec6 to 29efe12
Copilot reviewed 32 out of 33 changed files in this pull request and generated 4 comments.
Comment on lines +116 to +118

    std::vector<std::vector<std::vector<uint64_t>>> results(numGpuDevices,
      std::vector<std::vector<uint64_t>>(ev.numIterations,
        std::vector<uint64_t>(numXccs, 0)));

    int useDmaExec    = EnvVars::GetEnvVar("USE_DMA_EXEC"   , 0);
    int useRemoteRead = EnvVars::GetEnvVar("USE_REMOTE_READ", 0);
    int stride        = EnvVars::GetEnvVar("STRIDE"         , 1);
    int groupSize     = EnvVars::GetEnvVar("GROUP_SIZE"     , numRanks * numDetectedGpus);
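The snippet above reads each option from the environment with a fallback, and the default for GROUP_SIZE is the total device count, i.e. a single group spanning every GPU on every rank. A minimal sketch of that read-with-default pattern follows. `getEnvInt` is a hypothetical stand-in written for illustration; it is not the `EnvVars::GetEnvVar` implementation from this PR, and its handling of malformed values is an assumption.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper (NOT TransferBench's EnvVars::GetEnvVar): returns the
// integer value of an environment variable, or defaultValue when the variable
// is unset, empty, or not a complete base-10 integer.
inline int getEnvInt(const std::string& name, int defaultValue) {
  const char* raw = std::getenv(name.c_str());
  if (raw == nullptr || *raw == '\0') return defaultValue;

  char* end = nullptr;
  const long parsed = std::strtol(raw, &end, 10);
  if (*end != '\0') return defaultValue;  // trailing characters such as "4x"
  return static_cast<int>(parsed);
}
```

With this helper, `getEnvInt("GROUP_SIZE", numRanks * numDetectedGpus)` would yield the total device count whenever GROUP_SIZE is not set.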
Comment on lines +121 to +123

    if (numRanks * numDetectedGpus % groupSize) {
      Utils::Print("[ERROR] Group size %d cannot evenly divide %d total devices from %d ranks.\n", groupSize, numRanks * numDetectedGpus, numRanks);
      return ERR_FATAL;
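Because `*` and `%` share precedence and associate left to right, the condition above evaluates as `(numRanks * numDetectedGpus) % groupSize`. One thing the snippet does not show is a guard against GROUP_SIZE being set to 0, where `%` would be undefined behavior in C++. The sketch below performs the same divisibility check with that extra guard added as an assumption. `checkGroupSize` is a hypothetical helper and is not part of this PR.

```cpp
#include <string>

// Hypothetical helper (not from the PR): returns an empty string when
// groupSize is usable, otherwise a human-readable description of the problem.
inline std::string checkGroupSize(int numRanks, int gpusPerRank, int groupSize) {
  const int totalDevices = numRanks * gpusPerRank;

  // Assumed extra guard: reject zero/negative sizes before taking the modulo.
  if (groupSize <= 0) {
    return "Group size must be positive";
  }
  // Same divisibility rule as the reviewed snippet.
  if (totalDevices % groupSize != 0) {
    return "Group size " + std::to_string(groupSize) +
           " cannot evenly divide " + std::to_string(totalDevices) +
           " total devices from " + std::to_string(numRanks) + " ranks";
  }
  return "";
}
```

For example, with 2 ranks of 8 GPUs each, group sizes of 4 and 16 pass while 3 and 0 are rejected.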
Motivation
TransferBench v1.67.0 release
Technical Details
Test Plan
Test Result
Submission Checklist